Information retrieval for multivariate research data repositories

Author

  • Maximilian Scherer
Abstract

In this dissertation, I tackle the challenge of information retrieval for multivariate research data by providing novel means of content-based access. Large amounts of multivariate data are produced and collected in different areas of scientific research and industrial applications, including the human and natural sciences, the social and economic sciences, and applications like quality control, security and machine monitoring. Archival and re-use of this kind of data has been identified as an important factor in the supply of information to support research and industrial production. Due to increasing efforts in the digital library community, such multivariate data are collected, archived and often made publicly available by specialized research data repositories. A multivariate research data document consists of tabular data with m columns (measurement parameters, e.g., temperature, pressure, humidity) and n rows (observations). To make such data-sets accessible, they are annotated with meta-data according to a well-defined meta-data standard when they are archived. These annotations include the time, location, parameters, title and author of the document concerned, and potentially many more. For multivariate data in particular, each column is annotated with the parameter name and unit of its data (e.g., water depth [m]). Retrieving and ranking the documents an information seeker is looking for is an important and difficult challenge. To date, access to this data is primarily provided by means of the annotated, textual meta-data described above. An information seeker can search for documents of interest by querying the annotated meta-data. For example, she can retrieve all documents that were obtained in a specific region or within a certain period of time. Similarly, she can search for data-sets that contain a particular measurement via its parameter name, or for data-sets that were produced by a specific scientist.
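The document model and meta-data search described above can be sketched roughly as follows. This is a minimal illustration, not the repository software or the author's implementation: the class names `Column` and `Document`, the function `search_by_metadata`, the field names and all sample values are assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Column:
    name: str            # annotated parameter name, e.g. "water depth"
    unit: str            # annotated unit, e.g. "m"
    values: list[float]  # one value per observation (row)


@dataclass
class Document:
    title: str
    author: str
    collected: date         # hypothetical field for the time annotation
    columns: list[Column]   # the m measurement parameters


def search_by_metadata(docs, parameter=None, author=None, start=None, end=None):
    """Return the documents whose annotations match every given criterion."""
    hits = []
    for doc in docs:
        if parameter is not None and parameter not in {c.name for c in doc.columns}:
            continue
        if author is not None and doc.author != author:
            continue
        if start is not None and doc.collected < start:
            continue
        if end is not None and doc.collected > end:
            continue
        hits.append(doc)
    return hits


# Illustrative documents; titles, authors, dates and values are invented.
docs = [
    Document("Cast A", "Author One", date(2010, 5, 1),
             [Column("water depth", "m", [0.0, 10.0, 20.0]),
              Column("water pressure", "dbar", [0.0, 10.1, 20.2])]),
    Document("Station B", "Author Two", date(2012, 8, 15),
             [Column("temperature", "degC", [11.5, 12.0, 12.4]),
              Column("humidity", "%", [70.0, 68.0, 65.0])]),
]

print([d.title for d in search_by_metadata(docs, parameter="water depth")])
```

Note that the search only inspects the annotations (parameter name, author, date); the measured values in each column are never read.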
However, retrieval via textual annotations is limited and does not allow for content-based search, e.g., retrieving data that contain a particular measurement pattern, like a linear relationship between water depth and water pressure, or that are similar to example data the information seeker provides. In this thesis, I deal with this challenge and develop novel indexing and retrieval schemes to extend the established, meta-data based access to multivariate research data. By analyzing and indexing the data patterns occurring in multivariate data, one can support new techniques for content-based retrieval and exploration, well beyond meta-data based query methods. This allows information seekers to query for multivariate data-sets that exhibit patterns similar to an example data-set they provide. Furthermore, information seekers can specify one or more particular patterns they are looking for, to retrieve multivariate data-sets that contain similar patterns. To this end, I also develop visual-interactive techniques to support information seekers in formulating such queries, which are inherently more complex than textual search strings. These techniques include providing an overview of potentially interesting...
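To make the idea of a content-based query concrete, the sketch below ranks data-sets by how strongly two named columns follow a linear relationship, using the absolute Pearson correlation as the score. This is a stand-in technique chosen for illustration; the abstract does not state which features or index structures the thesis actually uses. The function names and the sample tables are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear pattern
    return cov / math.sqrt(var_x * var_y)


def rank_by_linear_pattern(tables, param_x, param_y):
    """Rank tables containing both columns by |r|, strongest first."""
    scored = []
    for name, table in tables.items():
        if param_x in table and param_y in table:
            r = pearson(table[param_x], table[param_y])
            scored.append((abs(r), name))
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Invented sample data: column name -> measured values.
tables = {
    "cast_a": {"water depth": [0, 10, 20, 30],
               "water pressure": [0.0, 10.1, 20.2, 30.3]},
    "cast_b": {"water depth": [0, 10, 20, 30],
               "water pressure": [5.0, 1.0, 9.0, 2.0]},
    "station_c": {"temperature": [11.5, 12.0, 12.4, 12.1]},
}

print(rank_by_linear_pattern(tables, "water depth", "water pressure"))
```

Unlike a meta-data query, the ranking here depends on the measured values themselves: both casts carry the same parameter names, but only the first exhibits the linear pattern.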

Similar resources

From Research Objects to Research Networks: Combining Spatial and Semantic Search

The spatial and semantic discovery of research objects extracted from sources available on the Web can be enabled with georeferenced and annotated metadata. Constraints on data retrieval are based on the types of queries and services that current repositories offer, which contribute to their limited usability. We address these constraints by illustrating a framework for a linked research networ...

Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016

Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology an...

Capability of the Country's Hospital Information Systems for Establishing Evidence-Based Medicine

Background and Aim: Evidence Based Medicine (EBM) is the explicit use of current best evidence in making decisions about the care of individual patients. A hospital information system (HIS) can act as a bridge between medical data and medical knowledge by merging patients' data, individual clinical knowledge and external evidence. The aim of this research was to determine the Capab...

Survey on Improving Genetic Algorithm Using Searching Concepts of Data Structures for Query Optimization in Information Retrieval

Information is an ultimate resource of every commercial and non-commercial sector. With the advent of new technologies and the popularized use of the web, information has been increasing at a greater pace. The organization of this information into unstructured database repositories has made information retrieval a complex process. The use of Genetic Algorithm for retrieving the information from s...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Publication date: 2013